Live Website Traffic Analysis Integrated with Improved Performance for Small Files using Hadoop
Abstract
Hadoop, an open source Java framework, deals with big data. Its core components are HDFS (Hadoop Distributed File System) and MapReduce. HDFS is designed to handle large files across clusters and suffers a performance penalty when dealing with a large number of small files: they place a heavy burden on the NameNode of HDFS and increase execution time for MapReduce. Secondly, as the application part, a traffic analyzer is implemented with the combination of Hadoop and the MapReduce paradigm, which makes it possible to analyze any website programmatically. A web ranking metric, web analytics, or simply web measurement refers to a system used to measure factors that affect a website's exposure and traffic on the web. The proposed approach handles small files: "merging" of small files is done using the MapReduce programming model on Hadoop. This approach improves the performance of Hadoop in handling small files and also reduces the memory required by the NameNode to store their metadata. Traffic analysis gives the rank, number of views, visitors, index number, and so on for any website, which indicates a true analysis of the website on a frequent basis using Hadoop.

Keywords— MapReduce; Hadoop; HDFS; Small Files; Traffic Analyzer
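The abstract does not show how the merge is carried out. A minimal sketch of the general technique, assuming the common approach of packing many small files into a single Hadoop SequenceFile keyed by file name, so the NameNode tracks one large file instead of thousands of small ones; the class name, paths, and local-directory input are illustrative, not the paper's actual code:

    import java.io.File;
    import java.nio.file.Files;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Packs a directory of small local files into one HDFS SequenceFile:
    // key = original file name, value = raw file bytes.
    public class SmallFileMerger {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path out = new Path("hdfs:///user/demo/merged.seq"); // hypothetical output path
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(out),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class))) {
                for (File f : new File(args[0]).listFiles()) { // args[0]: local input directory
                    byte[] data = Files.readAllBytes(f.toPath());
                    writer.append(new Text(f.getName()), new BytesWritable(data));
                }
            }
        }
    }

A MapReduce job can then consume the merged file through SequenceFileInputFormat, contacting the NameNode once per block rather than once per original small file.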
Similar Articles
An Efficient Approach to Optimize the Performance of Massive Small Files in Hadoop MapReduce Framework
Hadoop, the most popular open source distributed computing framework, was designed by Doug Cutting and his team; it involves thousands of nodes to process and analyze huge amounts of data, called Big Data. The major core components of Hadoop are HDFS (Hadoop Distributed File System) and MapReduce. This framework is the most popular and powerful one for storing, managing, and processing Big Data appl...
An Improved Approach for Analysis of Hadoop Data for All Files
In this paper an efficient framework is implemented on the Hadoop platform for almost all types of files. The proposed methodology is based on various algorithms implemented on the Hadoop platform, such as Scan, Read, and Sort. Various workloads of small and big size, such as Facebook, Co-author, and Twitter, are used for the analysis of the algorithms. The experimental results show...
Improving the Performance of Processing for Small Files in Hadoop: A Case Study of Weather Data Analytics
Hadoop is an open source Apache project that supports a master-slave architecture involving one master node and thousands of slave nodes. The master node acts as the NameNode, which stores all the metadata of files, and the slave nodes act as the DataNodes, which store all the application data. Hadoop is designed to process large data sets (petabytes). It becomes a bottleneck when handling mas...
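The excerpt is cut off, but the bottleneck it refers to is the NameNode keeping the entire file-system namespace in RAM. A back-of-envelope sketch, assuming the widely quoted rule of thumb of roughly 150 bytes of heap per namespace object; that figure comes from Hadoop community lore, not from this case study:

    // Rough NameNode heap estimate for many small files. The ~150 bytes
    // per file/block object is a community rule of thumb, not a number
    // taken from the paper above.
    public class NameNodeHeapEstimate {
        public static void main(String[] args) {
            long files = 10_000_000L;   // ten million small files
            long blocksPerFile = 1;     // each small file fits in a single block
            long bytesPerObject = 150;  // approximate heap cost per file/block object
            long heapBytes = files * (1 + blocksPerFile) * bytesPerObject;
            System.out.printf("Estimated NameNode heap: ~%.1f GB%n", heapBytes / 1e9);
        }
    }

At this scale the metadata alone costs about 3 GB of heap, which is why every approach in this list merges or packs small files before handing them to HDFS.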
A Method to Improve the Performance for Storing Massive Small Files in Hadoop
As a new open source project, Hadoop provides a new way to store massive data. Because of its high scalability, low cost, good flexibility, high speed, and strong fault tolerance, it has been widely adopted by internet companies. However, the performance of Hadoop drops significantly once it is used to handle massive small files. As a result, this paper proposes a new scheme to...
Hmfs: Efficient Support of Small Files Processing over HDFS
The storage and access of massive numbers of small files is one of the challenges in the design of distributed file systems. The Hadoop Distributed File System (HDFS) is primarily designed for reliable storage and fast access of very big files, while it suffers a performance penalty as the number of small files grows. A middleware called Hmfs is proposed in this paper to improve the efficiency of storing an...